Skip to content

Add Infection + PHPStan CI check on PHP 8.3 and fix PHP 7.x template serialization#145

Open
Copilot wants to merge 13 commits into
masterfrom
copilot/integrate-infection-static-analysis
Open

Add Infection + PHPStan CI check on PHP 8.3 and fix PHP 7.x template serialization#145
Copilot wants to merge 13 commits into
masterfrom
copilot/integrate-infection-static-analysis

Conversation

Copy link
Copy Markdown

Copilot AI commented May 2, 2026

  • Inspect existing serialization regression coverage and identify the best place for the new edge-case tests
  • Add focused regression tests for node-backed serialization edge cases (non-HTML tag, only p/div tag, invalid HTML, HTML5 implicit paragraph closing, chained paragraph tags)
  • Run targeted PHPUnit tests for the new regression coverage
  • Run the full PHPUnit suite and final validation

This change is Reviewable

@coveralls
Copy link
Copy Markdown

coveralls commented May 2, 2026

Coverage Status

coverage: 96.766% (-0.03%) from 96.791% — copilot/integrate-infection-static-analysis into master

@voku voku marked this pull request as ready for review May 2, 2026 13:06
@voku
Copy link
Copy Markdown
Owner

voku commented May 2, 2026

@copilot fix at least this : 39s
Run mkdir -p build/logs
PHPUnit 9.6.34 by Sebastian Bergmann and contributors.

.................................F............................. 63 / 410 ( 15%)
............................................................... 126 / 410 ( 30%)
........................................F.....F................ 189 / 410 ( 46%)
..FFF......................................F................... 252 / 410 ( 61%)
............................................................... 315 / 410 ( 76%)
............................................................... 378 / 410 ( 92%)
................................ 410 / 410 (100%)

Time: 00:38.946, Memory: 30.00 MB

There were 7 failures:

  1. CoverageBoostTest::testHtmlHelperAndXmlParserCoveragePaths
    Failed asserting that two strings are identical.
    --- Expected
    +++ Actual
    @@ @@
    -'one'
    +'one
    +'

/home/runner/work/simple_html_dom/simple_html_dom/tests/CoverageBoostTest.php:487
phpvfscomposer:///home/runner/work/simple_html_dom/simple_html_dom/vendor/phpunit/phpunit/phpunit:106

  1. HtmlDomParserTest::testLoadClosingTagOnSelfClosing
    Failed asserting that two strings are identical.
    --- Expected
    +++ Actual
    @@ @@
    -'

    Hey bro, click here

    '
    +'

    +

    Hey bro, click here

    '

/home/runner/work/simple_html_dom/simple_html_dom/tests/HtmlDomParserTest.php:1370
phpvfscomposer:///home/runner/work/simple_html_dom/simple_html_dom/vendor/phpunit/phpunit/phpunit:106

  1. HtmlDomParserTest::testLoadUpperCase
    Failed asserting that two strings are identical.
    --- Expected
    +++ Actual
    @@ @@
    -'

    hEY BRO, click here

    '
    +'

    +

    hEY BRO, click here

    '

/home/runner/work/simple_html_dom/simple_html_dom/tests/HtmlDomParserTest.php:1453
phpvfscomposer:///home/runner/work/simple_html_dom/simple_html_dom/vendor/phpunit/phpunit/phpunit:106

  1. HtmlDomParserTest::testNextNonWhitespaceSibling
    Failed asserting that two strings are identical.
    --- Expected
    +++ Actual
    @@ @@
    '


    [aaaaa]

  •    <p></p>
    
  •    <div>bbbbbb</div>
    
  •    <div>ccccccccccccccc</div>
    
  •    <div>
           <br>
       </div>
    
  •    <p></p>
    
  •    <p>
           <b>[ddddd]</b>
       </p>'
    

/home/runner/work/simple_html_dom/simple_html_dom/tests/HtmlDomParserTest.php:1867
phpvfscomposer:///home/runner/work/simple_html_dom/simple_html_dom/vendor/phpunit/phpunit/phpunit:106

  1. HtmlDomParserTest::testPreviousNonWhitespaceSibling
    Failed asserting that two strings are identical.
    --- Expected
    +++ Actual
    @@ @@
    '


    [aaaaa]

  •    <p></p>
    
  •    <div>bbbbbb</div>
    
  •    <div>ccccccccccccccc</div>
    
  •    <div>
           <br>
       </div>
    
  •    <p></p>
    
  •    <p>
           <b>[ddddd]</b>
       </p>'
    

/home/runner/work/simple_html_dom/simple_html_dom/tests/HtmlDomParserTest.php:1921
phpvfscomposer:///home/runner/work/simple_html_dom/simple_html_dom/vendor/phpunit/phpunit/phpunit:106

  1. HtmlDomParserTest::testGetHtmlInner
    Failed asserting that two strings are identical.
    --- Expected
    +++ Actual
    @@ @@
    -'Price $2188'
    +'Price $
    +2188'

/home/runner/work/simple_html_dom/simple_html_dom/tests/HtmlDomParserTest.php:1940
phpvfscomposer:///home/runner/work/simple_html_dom/simple_html_dom/vendor/phpunit/phpunit/phpunit:106

  1. HtmlSerializationRegressionTest::testNodeBackedInnerHtmlPreservesChildrenFormatting
    Failed asserting that two strings are identical.
    --- Expected
    +++ Actual
    @@ @@
    -'beforemiddleafter'
    +'beforemiddle
    +after'

/home/runner/work/simple_html_dom/simple_html_dom/tests/HtmlSerializationRegressionTest.php:72
phpvfscomposer:///home/runner/work/simple_html_dom/simple_html_dom/vendor/phpunit/phpunit/phpunit:106

FAILURES!
Tests: 410, Assertions: 3798, Failures: 7.

Generating code coverage report in Clover XML format ... done [00:00.027]
Error: Process completed with exit code 1.

Copilot AI and others added 7 commits May 2, 2026 13:29
Agent-Logs-Url: https://github.com/voku/simple_html_dom/sessions/0f295050-05df-4c8b-a858-3d14833cbeda

Co-authored-by: voku <264695+voku@users.noreply.github.com>
Copy link
Copy Markdown
Author

Copilot AI commented May 2, 2026

@copilot fix at least this : 39s
Run mkdir -p build/logs
PHPUnit 9.6.34 by Sebastian Bergmann and contributors.
...

Fixed in efbfd5e. The PHP < 8 detached-element serialization helper now strips the synthetic trailing newline that was reintroducing the CI failures, and I added regression coverage for the helper behavior. Local targeted regressions and the full PHPUnit suite are green. No UI changes.

Copilot AI requested a review from voku May 2, 2026 13:33
Copilot finished work on behalf of voku May 2, 2026 13:33
Copilot finished work on behalf of voku May 2, 2026 20:07
@voku
Copy link
Copy Markdown
Owner

voku commented May 2, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a configuration for the Infection mutation testing tool and refactors HTML node serialization to improve compatibility with PHP versions older than 8.0. The refactoring includes a new method, serializeElementNodeForPhpLt8, which handles element serialization by stripping unnecessary wrapper tags and trailing newlines. Feedback suggests reordering the removal of tags in this new method to ensure that empty body fragments are correctly removed before individual tags are stripped, preventing potential dangling tags.

Comment thread src/voku/helper/HtmlDomParser.php Outdated
Comment on lines +1250 to +1252
$content = (string) \preg_replace('/^<body[^>]*>/i', '', $content);
$content = (string) \preg_replace('/<\/body>\s*$/i', '', $content);
$content = \str_replace('<body></body>', '', $content);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The str_replace for the empty <body> tag should be performed before the regex-based stripping of <body> and </body> tags.

When serializing a <head> node on PHP < 8, libxml often appends an empty <body></body> at the end of the document. If preg_replace('/<\/body>\s*$/i', '', $content) runs first, it will strip the closing </body> tag, leaving a dangling <body> tag that str_replace('<body></body>', ...) will then fail to match and remove. Moving the literal replacement to the top of this block ensures it correctly catches the empty body fragment before the individual tags are stripped.

                $content = \str_replace('<body></body>', '', $content);
                $content = (string) \preg_replace('/^<body[^>]*>/i', '', $content);
                $content = (string) \preg_replace('/<\/body>\s*$/i', '', $content);

@voku
Copy link
Copy Markdown
Owner

voku commented May 2, 2026

@copilot Check if this feedback from another llm is valid or just AI slop: Code Review
This pull request introduces a configuration for the Infection mutation testing tool and refactors HTML node serialization to improve compatibility with PHP versions older than 8.0. The refactoring includes a new method, serializeElementNodeForPhpLt8, which handles element serialization by stripping unnecessary wrapper tags and trailing newlines. Feedback suggests reordering the removal of tags in this new method to ensure that empty body fragments are correctly removed before individual tags are stripped, preventing potential dangling tags.

src/voku/helper/HtmlDomParser.php
Comment on lines +1250 to +1252
$content = (string) \preg_replace('/^<body[^>]>/i', '', $content);
$content = (string) \preg_replace('/</body>\s
$/i', '', $content);
$content = \str_replace('', '', $content);
@gemini-code-assist
gemini-code-assist Bot
3 minutes ago
high

The str_replace for the empty tag should be performed before the regex-based stripping of and tags.

When serializing a node on PHP < 8, libxml often appends an empty at the end of the document. If preg_replace('/</body>\s*$/i', '', $content) runs first, it will strip the closing tag, leaving a dangling tag that str_replace('', ...) will then fail to match and remove. Moving the literal replacement to the top of this block ensures it correctly catches the empty body fragment before the individual tags are stripped.

            $content = \str_replace('<body></body>', '', $content);
            $content = (string) \preg_replace('/^<body[^>]*>/i', '', $content);
            $content = (string) \preg_replace('/<\/body>\s*$/i', '', $content);

@voku Reply...

Copy link
Copy Markdown
Author

Copilot AI commented May 2, 2026

@copilot Check if this feedback from another llm is valid or just AI slop: Code Review
This pull request introduces a configuration for the Infection mutation testing tool and refactors HTML node ser...

I checked it, and that feedback is not valid. The current order in efbfd5e is intentional: stripping the opening/closing <body> tags first and then removing the exact <body></body> fragment still collapses an empty body to '', including <body></body>\n. If we move str_replace('<body></body>', '', ...) first, the <body></body>\n case becomes just "\n", which recreates the newline regression instead of fixing it.

Copilot finished work on behalf of voku May 2, 2026 20:24
@voku voku requested a review from Copilot May 9, 2026 21:47
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR strengthens CI quality gates by adding mutation testing (Infection) + PHPStan checks on PHP 8.3, and adjusts PHP < 8 HTML serialization to avoid libxml-injected formatting/newline regressions (including template round-trip behavior).

Changes:

  • Add a new serialization regression test covering trailing-newline behavior in PHP < 8 element serialization.
  • Refactor PHP < 8 element node serialization to use whole-document saveHTML() + wrapper stripping, plus trailing newline removal.
  • Add Infection configuration and wire Infection + PHPStan into CI on PHP 8.3 (with an increased workflow timeout).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
tests/HtmlSerializationRegressionTest.php Adds a regression test for PHP < 8 element serialization output stability (no trailing newline artifacts).
src/voku/helper/HtmlDomParser.php Updates node serialization flow for PHP < 8 by introducing serializeElementNodeForPhpLt8() and routing element serialization through it.
infection.json5 Introduces Infection configuration (phpstan integration, logging).
.github/workflows/ci.yml Adds PHPStan + Infection runs on PHP 8.3 and increases job timeout to accommodate the additional workload.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1215 to +1261
/**
* On PHP < 8.0, saveHTML($node) injects formatting newlines for detached
* block-level elements, so serialize a temporary whole document instead.
*
* @param \DOMElement $node
*
* @return string
*/
private function serializeElementNodeForPhpLt8(\DOMElement $node): string
{
$document = new \DOMDocument('1.0', $this->getEncoding());
$document->preserveWhiteSpace = true;
$document->formatOutput = false;

$importedNode = $document->importNode($node, true);
// @phpstan-ignore instanceof.alwaysTrue (importNode() returns DOMNode here)
if (!$importedNode instanceof \DOMElement) {
return '';
}

$document->appendChild($importedNode);

$content = $document->saveHTML();
if ($content === false) {
return '';
}

$content = (string) \preg_replace('/^<!DOCTYPE[^>]+>\s*/i', '', $content);

$tagName = \strtolower($importedNode->tagName);
if ($tagName !== 'html') {
$content = (string) \preg_replace('/^<html[^>]*>/i', '', $content);
$content = (string) \preg_replace('/<\/html>\s*$/i', '', $content);

if ($tagName !== 'body') {
$content = (string) \preg_replace('/^<body[^>]*>/i', '', $content);
$content = (string) \preg_replace('/<\/body>\s*$/i', '', $content);
$content = \str_replace('<body></body>', '', $content);
}
}

if (\substr($content, -1) === "\n") {
$content = \substr($content, 0, -1);
}

return $content;
}
Comment on lines +75 to +101
public function testSerializeElementNodeDoesNotAppendTrailingNewline()
{
$document = HtmlDomParser::str_get_html(
'<div><span>one</span><br><p>two</p><template id="card"><section><h2>Title</h2><p>Body</p></section></template></div>'
);

$serializeElementNodeForPhpLt8 = new \ReflectionMethod(HtmlDomParser::class, 'serializeElementNodeForPhpLt8');
if (\PHP_VERSION_ID < 80100) {
// This version check is only for Reflection behavior: private method
// access still needs setAccessible() when PHP_VERSION_ID < 80100
// (PHP 8.0 and earlier).
$serializeElementNodeForPhpLt8->setAccessible(true);
}

$spanHtml = $serializeElementNodeForPhpLt8->invoke($document, $document->getElementByTagName('span')->getNode());
$brHtml = $serializeElementNodeForPhpLt8->invoke($document, $document->getElementByTagName('br')->getNode());
$pHtml = $serializeElementNodeForPhpLt8->invoke($document, $document->getElementByTagName('p')->getNode());

static::assertSame(
'<span>one</span><br><p>two</p>',
$spanHtml . $brHtml . $pHtml
);
static::assertSame(
'<template id="card"><section><h2>Title</h2><p>Body</p></section></template>',
$serializeElementNodeForPhpLt8->invoke($document, $document->findOne('template')->getNode())
);
}
Comment on lines 1188 to +1192
private function serializeNode(\DOMNode $node): string
{
// For script/style on PHP < 8.0 use ownerDocument to avoid fresh-doc
// libxml injecting "\n" inside raw-text content.
$useOwnerDoc = \PHP_VERSION_ID < 80000
&& $node instanceof \DOMElement
&& \in_array(\strtolower($node->tagName), ['script', 'style'], true);

if (!$useOwnerDoc) {
$document = new \DOMDocument('1.0', $this->getEncoding());
$document->preserveWhiteSpace = true;
$document->formatOutput = false;

$importedNode = $document->importNode($node, true);
// @phpstan-ignore instanceof.alwaysTrue (importNode() returns DOMNode here)
if (!$importedNode instanceof \DOMNode) {
return '';
}
if (\PHP_VERSION_ID < 80000 && $node instanceof \DOMElement) {
return $this->serializeElementNodeForPhpLt8($node);
}
@voku
Copy link
Copy Markdown
Owner

voku commented May 10, 2026

@copilot apply changes based on the comments in this thread

Copy link
Copy Markdown
Author

Copilot AI commented May 10, 2026

@copilot apply changes based on the comments in this thread

Applied in 2acb658. I extracted the shared PHP < 8 wrapper-stripping logic, updated the serializeNode() docblock to match the current behavior, and guarded the helper-specific regression test so it only runs on PHP < 8 where that private helper is actually used.

Copilot finished work on behalf of voku May 10, 2026 07:44
@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants